# efficient inference
## Phi 4 Mini Instruct Float8dq

The Phi-4-mini-instruct model quantized with float8 dynamic activation and weight quantization via torchao, achieving a 36% VRAM reduction and a 15-20% speed improvement on H100 with minimal accuracy impact.

- Category: Large Language Model (Transformers)
- Author: pytorch
- License: MIT
- Downloads: 1,006 · Likes: 1
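As a rough illustration of the recipe named above, the sketch below quantizes a bf16 checkpoint with torchao's float8 dynamic activation and weight quantization. The model id and the exact torchao entry points are assumptions (API names vary across torchao releases), not the card's published recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

# Assumed base model id; the card above likely ships already-quantized weights.
model_id = "microsoft/Phi-4-mini-instruct"

# Load in bf16 first; quantization then rewrites the linear layers in place.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dynamic float8 quantization: weights are stored in float8, and activations
# are quantized to float8 on the fly at each matmul.
quantize_(model, float8_dynamic_activation_float8_weight())
```

Because activation quantization is dynamic, no calibration pass is needed, which is what makes this recipe a near drop-in swap for the bf16 model.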
## Mistral Small 3.1 24B Instruct 2503 GPTQ 4b 128g

An INT4-quantized version of Mistral-Small-3.1-24B-Instruct-2503. The GPTQ algorithm reduces the weights from 16-bit to 4-bit (group size 128), significantly decreasing disk size and GPU memory requirements.

- Category: Large Language Model
- Author: ISTA-DASLab
- License: Apache-2.0
- Downloads: 21.89k · Likes: 13
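A hedged sketch of loading such a GPTQ checkpoint with transformers follows. The repo id is inferred from the entry name, and the Auto class is an assumption (Mistral Small 3.1 is multimodal, so the right class may differ by transformers version); a GPTQ kernel backend such as gptqmodel must be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the entry name above.
model_id = "ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g"

# The quantization config ships inside the checkpoint, so a plain
# from_pretrained picks up the GPTQ 4-bit weights automatically.
# Rough memory math: 24B params at 16 bits is ~48 GB of weights;
# at 4 bits it is ~12 GB, plus per-group scales (one per 128 weights).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```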
## Omost Phi 3 Mini 128k 8bits

Omost's phi-3-mini model with 128k context length, utilizing fp8 precision.

- Category: Large Language Model (Transformers)
- Author: lllyasviel
- Downloads: 47 · Likes: 7
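To make "fp8 precision" concrete, here is a minimal, generic PyTorch sketch of casting a weight tensor to float8 (e4m3). It illustrates only the storage-versus-precision trade-off and is not the recipe used to produce this checkpoint.

```python
import torch

# A bf16 tensor standing in for a transformer linear layer's weights.
w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Casting to float8 (e4m3) halves storage relative to bf16 (1 byte/element)...
w_fp8 = w_bf16.to(torch.float8_e4m3fn)

# ...at the cost of precision: e4m3 keeps only 3 mantissa bits.
roundtrip = w_fp8.to(torch.bfloat16)
print("max round-trip error:", (w_bf16 - roundtrip).abs().max().item())
```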
## Omost Llama 3 8b 4bits

Omost's released llama-3 model, featuring 8k context length and nf4 quantization.

- Category: Large Language Model (Transformers)
- Author: lllyasviel
- Downloads: 1,163 · Likes: 21
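The sketch below shows how an nf4 checkpoint like this one is typically loaded through transformers' bitsandbytes integration. The repo id is inferred from the entry name; if the published weights already embed their quantization config, the explicit config shown here is illustrative rather than required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# nf4 ("NormalFloat4") is bitsandbytes' 4-bit data type, designed for
# normally distributed weights; compute still runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumed repo id, inferred from the entry name above.
model_id = "lllyasviel/omost-llama-3-8b-4bits"
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```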